[Draft] Add compression ratio calculation and per-column compression stats (#18184) by johnsolomonj · Pull Request #18185 · apache/pinot

johnsolomonj · 2026-04-13T18:59:50Z

Labels: feature, release-notes, observability

Summary

Draft implementation for the PEP proposed in #18184. Kept as draft pending design review on the issue.

Adds compression ratio tracking and per-column compression stats to Pinot's existing table size and metadata APIs:

Track uncompressed forward index sizes at write time in all raw column writers (BaseChunkForwardIndexWriter subclasses, VarByteChunkForwardIndexWriterV4/V5/V6, CLPForwardIndexCreatorV2)
Persist uncompressed size and compression codec to metadata.properties per column
Expose compressionStats, columnCompressionStats, and storageBreakdown on both GET /tables/{table}/size and GET /tables/{table}/metadata
Add TABLE_COMPRESSION_RATIO_PERCENT and TABLE_TIERED_STORAGE_SIZE controller gauges with tier lifecycle management
Gated by table-level indexingConfig.compressionStatsEnabled flag (default: off, zero overhead when disabled)

Design document

See #18184 for the full PEP including motivation, prior art, API response structure, and known corner cases.

Key design decisions

Per-value tracking: Uncompressed size tracked at individual put*() callsites, capturing raw ingested data size without chunk headers or alignment padding
Shared codec resolution: ForwardIndexType.resolveCompressionType() handles CLP codec variants, used by both BaseSegmentCreator and ForwardIndexHandler
Dict columns out of scope: Dictionary-encoded columns report -1 for uncompressed size since writers only see dictionary IDs
Backward compatible: New metadata fields are additive; old segments gracefully return defaults

Test plan

Unit tests for writer uncompressed size tracking (fixed-byte, var-byte V1-V3, V4/V5/V6)
Unit tests for CLP V2 sub-stream size aggregation
Unit tests for ForwardIndexType.resolveCompressionType() codec resolution
Unit tests for ForwardIndexHandler compression stats persistence on reload
Controller aggregation tests (dict sentinel preservation, negative ratio guards, partial coverage)
Integration test for end-to-end compression stats API response
Verify zero overhead when compressionStatsEnabled = false

codecov-commenter · 2026-04-13T19:51:35Z

Codecov Report

❌ Patch coverage is 57.87234% with 99 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.82%. Comparing base (e76da0e) to head (0741c48).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
...local/segment/creator/impl/BaseSegmentCreator.java	4.16%	22 Missing and 1 partial ⚠️
...ent/creator/impl/fwd/CLPForwardIndexCreatorV2.java	0.00%	19 Missing ⚠️
...ocal/segment/index/loader/ForwardIndexHandler.java	29.16%	15 Missing and 2 partials ⚠️
.../local/segment/index/forward/ForwardIndexType.java	0.00%	8 Missing ⚠️
...ment/index/forward/ForwardIndexCreatorFactory.java	68.18%	4 Missing and 3 partials ⚠️
...ocal/segment/creator/impl/ColumnIndexCreators.java	0.00%	5 Missing ⚠️
...inot/segment/spi/creator/IndexCreationContext.java	0.00%	5 Missing ⚠️
.../writer/impl/FixedByteChunkForwardIndexWriter.java	50.00%	2 Missing and 2 partials ⚠️
...ot/segment/spi/creator/SegmentGeneratorConfig.java	25.00%	3 Missing ⚠️
...gment/local/io/writer/impl/VarByteChunkWriter.java	0.00%	2 Missing ⚠️
... and 4 more

❗ There is a different number of reports uploaded between BASE (e76da0e) and HEAD (0741c48). Click for more details.

HEAD has 1 upload less than BASE

Flag BASE (e76da0e) HEAD (0741c48)

unittests2 1 0

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18185      +/-   ##
============================================
- Coverage     63.73%   55.82%   -7.91%     
+ Complexity     1932      823    -1109     
============================================
  Files          3292     2568     -724     
  Lines        201503   148791   -52712     
  Branches      31320    24043    -7277     
============================================
- Hits         128429    83065   -45364     
+ Misses        62794    58624    -4170     
+ Partials      10280     7102    -3178

Flag	Coverage Δ
custom-integration1	`100.00% <ø> (ø)`
integration	`100.00% <ø> (ø)`
integration1	`100.00% <ø> (ø)`
integration2	`0.00% <ø> (ø)`
java-21	`55.82% <57.87%> (-7.91%)`	⬇️
temurin	`55.82% <57.87%> (-7.91%)`	⬇️
unittests	`55.82% <57.87%> (-7.91%)`	⬇️
unittests1	`55.82% <57.87%> (+0.04%)`	⬆️
unittests2	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…to table size API This feature enables tracking and reporting of forward index compression effectiveness across Pinot segments. When `compressionStatsEnabled` is set in table config's indexing config, segment creation records uncompressed forward index sizes and compression codec in metadata.properties. The server-side table size endpoint now returns per-segment and per-column raw/compressed forward index sizes. The controller aggregates these into table-level compression ratio metrics (raw/compressed), with partial coverage tracking for mixed-version clusters. Three new ControllerGauge metrics (TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA, TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA) are emitted for monitoring. ForwardIndexHandler is updated to persist compression metadata during segment reload operations (compression type change and dict-to-raw conversion).

…feature - Add 6 new test files covering writer-level tracking, segment creation, corner cases, ForwardIndexHandler reload, and integration tests for both offline and realtime (Kafka) ingestion paths - Merge redundant dual-loop in TableSizeReader into a single pass over server info, improving performance during table size aggregation - Fix offline integration test teardown to properly wait for table data manager removal before stopping servers - Wrap second table cleanup in offline test in finally block to prevent resource leaks on assertion failure

…tier breakdown, and stale metadata cleanup - Wrap flat compression fields in nested CompressionStats DTO with @JsonInclude(NON_NULL) - Add StorageBreakdown with per-tier segment count and size (always reported) - Add per-column ColumnCompressionDetail with aggregated sizes, ratio, and codec (MIXED when codecs differ across segments) - Gate compressionStats on tableConfig.indexingConfig.compressionStatsEnabled; suppress from JSON when OFF - Fix isPartialCoverage: now correctly returns true when 0 segments have stats but non-missing segments exist - Clear stale forwardIndex.compressionCodec and forwardIndex.uncompressedSizeBytes on raw-to-dict reload - Support null values in SegmentMetadataUtils.updateMetadataProperties to clear properties - Add TABLE_TIERED_STORAGE_SIZE gauge; emit tier metrics always; clear compression+tier gauges when flag OFF - Add testRawToDictClearsCompressionStats, testCompressionStatsNullWhenFlagOff, per-column/tier assertions - Update integration tests for nested compressionStats JSON structure

…API, and comprehensive tests - Gate uncompressed size tracking in forward index writers via compressionStatsEnabled flag (ForwardIndexCreatorFactory, ForwardIndexHandler, all raw index creators) - Add per-column compression stats aggregation to server TablesResource and ServerSegmentMetadataReader with MIXED codec detection - Extend TableMetadataInfo DTO with columnCompressionStats field (NON_NULL suppression) - Fix integration test schema name mismatch for disabled-stats table - Add 7 new test classes: IndexingConfigCompressionFlagTest, SegmentGeneratorConfigPropagationTest, CLPForwardIndexCreatorV2StatsTest, ServerTableSizeReaderRawBytesTest, TableMetadataReaderCompressionTest, TableMetadataInfoCompressionTest, ForwardIndexHandlerCompressionStatsTest updates

…feature flag - Replace Map<String, ColumnCompressionStatsInfo> with List<ColumnCompressionStatsInfo> array containing all required fields: column, uncompressedSizeInBytes, compressedSizeInBytes, compressionRatio, codec, hasDictionary, indexes - Gate columnCompressionStats in both server endpoints (TablesResource metadata, TableSizeResource size) on compressionStatsEnabled feature flag - Add controller-side suppression in PinotTableRestletResource for safety against old servers that may still emit stats when flag is off - Fix forward index size accumulation: use getIndexSizeFor(StandardIndexes.forward()) directly per segment instead of cumulative variable - Sort columnCompressionStats array by column name for deterministic output - Update all tests and DTOs for the new array schema

…, and metadata endpoint gaps Resolve default compression codec (LZ4/PASS_THROUGH) in BaseSegmentCreator and ForwardIndexHandler when table config leaves chunkCompressionType null. Include dictionary-encoded columns in columnCompressionStats with hasDictionary=true. Clear stale controller compression metrics when no segments report stats. Suppress zeroed compressionStats for dict-only tables. Add compressionStats summary to the metadata endpoint aggregated from per-column data. Add tests for all fixes.

…ession - Move columnCompressionStats to top-level field on TableSubTypeSizeDetails instead of nesting inside CompressionStats inner class - Remove unused ColumnCompressionDetail inner class; use shared ColumnCompressionStatsInfo DTO from pinot-common - Fix metadata endpoint field names to match size endpoint: rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes - Suppress compressionStats and columnCompressionStats for dict-only tables and when feature flag is OFF - Dictionary columns report -1 for uncompressedSizeInBytes to distinguish from zero-size raw columns on both size and metadata endpoints - Use LinkedHashSet for index deduplication in per-column aggregation

…adata endpoint - Create CompressionStatsSummary DTO in pinot-common with rawForwardIndexSizePerReplicaInBytes, compressedForwardIndexSizePerReplicaInBytes, and compressionRatio - Create StorageBreakdownInfo DTO in pinot-common with per-tier count and size - Add both as @JsonInclude(NON_NULL) fields on TableMetadataInfo with backward-compatible constructors - Server computes compressionStats summary and storageBreakdown during segment iteration and includes them in TableMetadataInfo response - Controller aggregates both fields across servers in ServerSegmentMetadataReader (divides by numReplica like other fields) - Remove manual addCompressionStatsSummary() JSON manipulation from PinotTableRestletResource; controller now just strips fields when flag is OFF - Fix use-after-release: tier accumulation moved inside try block before segments are released in the finally block

…racking accuracy - Remove ObjectNode.remove() pattern from PinotTableRestletResource; pass compressionStatsEnabled flag through TableMetadataReader and ServerSegmentMetadataReader so DTOs are constructed with null at creation time when the flag is OFF (storageBreakdown always preserved) - Add segmentsWithStats, totalSegments, isPartialCoverage to CompressionStatsSummary so metadata endpoint has identical JSON schema to the size endpoint; populate during server and controller aggregation - Fix VarByteChunkForwardIndexWriterV4 uncompressed size tracking: track raw value byte lengths in putBytes() instead of buffer.remaining() in write() which included chunk-format header overhead - Move segment stats counting inside the try block in TablesResource to avoid accessing segment metadata after segments are released - Add test for compression stats suppression when flag is disabled

Old segments lacking uncompressed size metadata (pre-flag segments) were contributing their compressed forward index size to the denominator while adding nothing to the numerator, deflating the compression ratio. Now only dictionary-encoded columns enter the compressed-only accumulation path; raw columns on old segments without codec/uncompressed data are skipped entirely from both numerator and denominator.

- Track uncompressed size per value in putInt/putLong/putFloat/putDouble (FixedByteChunkForwardIndexWriter) and putBytes (VarByteChunkForwardIndexWriter) instead of using chunk buffer remaining bytes which overcounts due to chunk-internal offset tables - Gate server size endpoint compression field collection on feature flag so no metadata access or accumulation occurs when flag is OFF - Exclude old raw segments lacking a persisted compression codec from per-column stats to prevent sentinel values leaking into aggregation - Guard Math.max in per-column accumulation against INDEX_NOT_FOUND (-1) sentinel values - Preserve per-column compression stats for dict-only tables even when no segments have raw forward index data (segmentsWithStats == 0) - Rename TABLE_COMPRESSION_RATIO_HUNDREDTHS gauge to TABLE_COMPRESSION_RATIO_PERCENT for consistency - Hoist IndexService.getInstance() outside per-column inner loop

- Fix negative compression ratio for dict columns in metadata endpoint: require both uncompressed > 0 and compressed > 0 before computing ratio in ServerSegmentMetadataReader (per-column and summary level) - Track emitted tier gauge keys per table in TableSizeReader so stale tier-suffixed gauges (tableName.tierKey) are removed when tiers disappear or table is deleted via SegmentStatusChecker cleanup - Resolve CLP V2 actual compression type (ZSTANDARD) from the CompressionCodec before falling back to ForwardIndexConfig's chunkCompressionType, which maps all CLP variants to PASS_THROUGH

Extract resolveCompressionType() as a shared utility in ForwardIndexType that correctly maps CLP codec variants to their actual compression types (CLPV2/CLPV2_ZSTD → ZSTANDARD, CLPV2_LZ4 → LZ4, CLP → PASS_THROUGH). This fixes ForwardIndexHandler using incorrect compression types during codec changes and dict-to-raw conversions. BaseSegmentCreator now uses the same shared method, handling nullable fieldType for schema evolution cases. Also document CLPForwardIndexCreatorV2.getUncompressedSize() semantics: returns pre-compression sub-stream byte total, not original UTF-8 length.

…umns Size endpoint flag-OFF path now uses the 5-arg SegmentSizeInfo constructor to pass through tier information, fixing storageBreakdown flattening all segments into the "default" tier when compression stats are disabled. Metadata endpoint aggregation now skips columns from old raw segments that have no persisted compression codec and no dictionary, preventing zero-filled entries from appearing in per-column compression stats.

Dictionary columns report -1 as uncompressed forward index size. When aggregating across replicas, skip accumulation for negative sentinels (using >= 0 guard) and reconstruct -1 in the output for dict columns that have no real uncompressed data. Also guard compressionStatsSummary construction on segmentsWithStats > 0 after replica division, avoiding a degenerate summary when integer division rounds the count to zero.

…stats Tier gauge emission was passing the tier-suffixed key (e.g. "myTable_OFFLINE.coldTier") to isLeaderForTable(), which expects the base table name. Hoist the leader check to use the canonical tableNameWithType before the tier loop, then emit gauges directly. On the server metadata endpoint, move computeIfAbsent for per-column compression accumulators inside the conditional branches so old raw segments without a persisted codec no longer create zero-filled entries.

- Reorder FieldConfig import in BaseSegmentCreator to satisfy spotless - Replace ColumnMetadata.INDEX_NOT_FOUND with UNAVAILABLE (upstream rename) - Replace deprecated RandomStringUtils.randomAlphanumeric with secure().nextAlphanumeric

…Summary - ColumnMetadataImplTest: add two tests verifying the new compression stats fields (FORWARD_INDEX_UNCOMPRESSED_SIZE, FORWARD_INDEX_COMPRESSION_CODEC) round-trip through fromPropertiesConfiguration, and that old segments without those keys return UNAVAILABLE/-1 and null respectively. - CompressionStatsSummaryTest: new test class covering all getters, full/partial coverage flag, JSON round-trip, and unknown-field backward compatibility for the new DTO.

- ForwardIndexTypeTest: add testResolveCompressionType covering all 6 branches (CLP→PASS_THROUGH, CLPV2/CLPV2_ZSTD→ZSTANDARD, CLPV2_LZ4→LZ4, regular codec, field-type fallback, null fieldType) - SegmentMetadataUtilsTest: new test verifying null-value map entries clear the property vs non-null entries overwriting it - ColumnCompressionStatsInfoTest: new test covering all getters, JSON round-trip, null codec/indexes, hasDictionary, and unknown-field tolerance - StorageBreakdownInfoTest: new test covering TierInfo getters, multi-tier round-trip, empty map, and unknown-field tolerance on both types - TablesResourceTest: verify columnCompressionStats and compressionStats are null when compressionStatsEnabled is false - TableSizeResourceTest: verify columnCompressionStats is null per-segment when compressionStatsEnabled is false

…ssertion style

…rageBreakdown aggregation and dict sentinel path

…he#18470

…ssedSize coverage

…mixed-age tables When a table has segments built before and after `noDictionaryColumns` was added to the config, `TablesResource` was overwriting `columnHasDictMap` on each segment, so the last segment iterated determined the reported `hasDictionary` value. If a new raw segment happened to be processed before an old dict segment, the column would be incorrectly flagged as `hasDictionary=true`, causing the table-level `compressionStats` summary to be omitted. Fix: use `merge` with `&&` (false wins) so that once any segment in the table marks a column as raw, the per-table aggregated value is `false`. In the raw branch, use the literal `false` directly since `codec != null && uncompressedSize > 0` already proves the column has no dictionary. Also increase ZkStarter initial wait (50ms → 2000ms + 500ms retry) to avoid ZK probe failures on slow CI machines.

The 2000ms initial sleep and 50ms→500ms retry increase were added during local quickstart debugging and should not be in this PR.

@JsonProperty

…ionType NPE risk ColumnCompressionStatsInfo.isHasDictionary() was serialized by Jackson as \"isHasDictionary\" while @JsonCreator bound deserialization to \"hasDictionary\", breaking round-trip consistency. Rename getter to hasDictionary() and add @JsonProperty(\"hasDictionary\") so serialization and deserialization use the same wire key. Update all call sites. ForwardIndexHandler called compressionType.name() on the return of resolveCompressionType() without a null check, violating the method's documented contract. Wrap both call sites in null guards, matching BaseSegmentCreator's pattern.

…t SPI The method was abstract, which would break any external IndexCreationContext implementor on upgrade. Provide a default returning false (feature-off) to preserve backward compatibility for plugins and external implementations. Also document the mutable CompressionStats/StorageBreakdown inner classes as intentional accumulators separate from the immutable pinot-common DTOs.

…oveTableGauge(name, key, gauge) Tier storage gauges were emitted as setValueOfTableGauge(tableName + "." + tierKey, gauge, value), concatenating the tier key into the tableName slot. This produces a non-canonical gauge name that is invisible to any code querying gauges by table name. Use the correct 3-argument overloads setOrUpdateTableGauge(tableName, tierKey, gauge, value) and removeTableGauge(tableName, tierKey, gauge) throughout emitTierMetrics and clearTierMetrics. The _trackUncompressedSize default-flip (C4.13) is deferred: writers are used directly in tests and tooling without going through ForwardIndexCreatorFactory, so changing the default breaks those callers. The existing factory-based path already sets the flag correctly via setTrackUncompressedSize(context.isCompressionStatsEnabled()).

johnsolomonj force-pushed the feature/compression-stats-tracking branch 3 times, most recently from b9e573e to 7667a13 Compare May 12, 2026 23:13

johnsolomonj mentioned this pull request May 19, 2026

[PEP] Add compression ratio calculation and per-column compression stats #18184

Open

johnsolomonj force-pushed the feature/compression-stats-tracking branch from 86fbf8d to 88a9887 Compare May 19, 2026 14:51

jsol-splunk added 24 commits May 19, 2026 07:54

Polish TableSizeResourceTest: remove redundant queryParam and align a…

4005086

…ssertion style

Add ServerSegmentMetadataReader coverage: CompressionStatsSummary/Sto…

64ab965

…rageBreakdown aggregation and dict sentinel path

Fix missing @nullable import in ColumnMetadata after rebase onto apac…

3bcf1eb

…he#18470

Add CLPForwardIndexCreatorV2 getUncompressedSize and setTrackUncompre…

6e8e183

…ssedSize coverage

Retrigger CI

4ecf85e

johnsolomonj force-pushed the feature/compression-stats-tracking branch from 88a9887 to ad9d6da Compare May 19, 2026 15:04

Revert unrelated ZkStarter timing changes from hasDictionary fix commit

9c560a0

The 2000ms initial sleep and 50ms→500ms retry increase were added during local quickstart debugging and should not be in this PR.

johnsolomonj force-pushed the feature/compression-stats-tracking branch from ad9d6da to 9c560a0 Compare May 19, 2026 15:11

jsol-splunk added 3 commits May 19, 2026 09:19

johnsolomonj force-pushed the feature/compression-stats-tracking branch from 0bf95a3 to 0741c48 Compare May 19, 2026 19:41

xiangfu0 added release-notes Referenced by PRs that need attention when compiling the next release notes observability Related to observability (logging, tracing, metrics) feature New functionality labels May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Draft] Add compression ratio calculation and per-column compression stats (#18184)#18185

[Draft] Add compression ratio calculation and per-column compression stats (#18184)#18185
johnsolomonj wants to merge 29 commits into
apache:masterfrom
johnsolomonj:feature/compression-stats-tracking

johnsolomonj commented Apr 13, 2026 •

edited

Loading

Uh oh!

codecov-commenter commented Apr 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

johnsolomonj commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design document

Key design decisions

Test plan

Uh oh!

codecov-commenter commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

johnsolomonj commented Apr 13, 2026 •

edited

Loading

codecov-commenter commented Apr 13, 2026 •

edited

Loading